Aggression and Complexity in Trump’s 2020 Rhetoric

An Analysis of 2020 Presidential Campaign Speeches

Kayla Muller

2025-05-06

Table of Contents

  • Introduction
  • Data
  • Aggression Time Trend
  • Simplicity Time Trend
  • Topic Modeling: Aggression
  • Topic Modeling: Simplicity
  • Conclusion

Introduction

Research Question:

Is there a correlation between aggression and rhetorical complexity in Donald Trump’s 2020 presidential campaign speeches?

Data

Chalkiadakis, Ioannis; Anglès d’Auriac, Louise; Peters, Gareth; and Frau-Meigs, Divina. A text dataset of campaign speeches of the main tickets in the 2020 US presidential election (September 20, 2024).

  • This analysis uses Trump’s campaign speeches from the 2020 presidential election to assess whether, and to what extent, aggression correlates with rhetorical simplicity.
  • The dataset consists of 235 official transcripts of Donald Trump’s speeches throughout his 2020 presidential campaign, from January 2019 through January 2021.

Monthly Average Aggression Ratio

Visualizing the trend next to a two-month rolling average.


Aggression in the 75th Percentile

Visualizing the subset of the 21 most aggressive speeches: those with an aggression ratio above 0.206258, the 75th-percentile cutoff.
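The 0.206258 cutoff is the 75th percentile of the per-speech aggression ratios. With pandas it could be derived rather than hard-coded; a small sketch with toy numbers (not the real ratios):

```python
import pandas as pd

# Toy per-speech aggression ratios standing in for the real neg_ratio column
ratios = pd.Series([0.10, 0.12, 0.15, 0.18, 0.21, 0.22, 0.25, 0.30])

# 75th-percentile cutoff, computed instead of hard-coded
threshold = ratios.quantile(0.75)

# Speeches above the cutoff form the "most aggressive" subset
top_quartile = ratios[ratios > threshold]
```

Deriving the threshold this way keeps the subset definition consistent if the dataset changes.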

Rhetorical Complexity

Monthly Average Flesch Score

Simplicity in the 75th Percentile

Speeches with a flesch_score above 68.72 are Trump’s simplest. This 75th-percentile subset comprises 59 of the 235 speeches.
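For reference, the score that textstat computes follows the standard Flesch Reading Ease formula, where higher values indicate simpler text. A hand-rolled sketch for illustration only (the analysis itself uses textstat):

```python
def flesch_ease(total_words, total_sentences, total_syllables):
    """Standard Flesch Reading Ease formula: higher = easier to read."""
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Short sentences and few syllables per word yield a high (simple) score
score = flesch_ease(total_words=100, total_sentences=10, total_syllables=120)
# score -> 95.165
```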

Topic Modeling: Aggression in the 75th Percentile

Latent Dirichlet Allocation (LDA) is used to identify the top topics among the most aggressive speeches (the 75th-percentile subset).

LatentDirichletAllocation(n_components=7, random_state=42)
Topic #1:  know said peopl want dont say thing great think right
Topic #2:  woman nation iran futur countri american terror state continu busi
Topic #3:  race state unit sex order nation american feder act agenc
Topic #4:  countri american border year biden peopl nation america presid want
Topic #5:  iran world unit nation state american china peopl year hong
Topic #6:  thank american america nation great peopl state unit child histori
Topic #7:  divis holocaust appoint th unit woman act secretari crime day

WordCloud Top Topics: Aggression

Topic Modeling: Simplicity

LatentDirichletAllocation(n_components=6, random_state=42)
Topic #1:  peopl think good lot thing great number want countri meet
Topic #2:  thank peopl great know said want think countri like say
Topic #3:  know said peopl dont want year laughter say right great
Topic #4:  crowd number great happen mani come know im weve thank
Topic #5:  peac heart brother robert wonder memori tonight live best forev
Topic #6:  presid trump said know want dont peopl year say biden

WordCloud Top Topics: Simplicity

Conclusion

Summary

  • Moderate Inverse Relationship
  • Aggression Topics: Justice and order, fake news, China, and immigration
  • Topics linked to linguistic simplicity: His brother’s passing, patriotism, and policy
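The reported moderate inverse relationship between aggression and complexity can be quantified with a correlation coefficient between the per-speech aggression ratio and Flesch score. A sketch with toy values chosen to illustrate the reported direction; the real analysis would use Trumpdf’s neg_ratio and flesch_score columns:

```python
import pandas as pd

# Toy per-speech values; in the real analysis these would be
# Trumpdf["neg_ratio"] and Trumpdf["flesch_score"]
df = pd.DataFrame({
    "neg_ratio":    [0.10, 0.15, 0.20, 0.25, 0.30],
    "flesch_score": [60.0, 63.0, 66.0, 70.0, 72.0],
})

# Pearson's r between aggression and simplicity. A positive r here means
# more aggressive speeches score as simpler, i.e. an inverse relationship
# between aggression and complexity.
r = df["neg_ratio"].corr(df["flesch_score"])
```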

Future Research

Larger Dataset

  • I recommend using a more diverse selection of documents (tweets, statements made on social media, and transcriptions of video clips)

  • Trump has been known to make inflammatory remarks about political opponents on social media, making those platforms a promising avenue for deeper analysis.

Appendix

Monthly Average Aggression Ratio

# Custom lexicon of aggressive/negative words used to score each speech
american_words = [
    "abuse", "abysmal", "accusation", "accusations", "accuse", "accusing", "adversarial",
    "aggressive", "anger", "angered", "annoyance", "annoyed", "annoying", "antagonistic",
    "antagonize", "appalling", "archaic", "arrogance", "arrogant", "ashamed", "assault",
    "assaulted", "assaulting", "attacking", "atrocious", "backtalk", "bitter", "bitterly",
    "bitterness", "blackened", "blackmail", "blame", "blamed", "blaming", "blunder", "bogus",
    "botch", "botched", "betray", "betrayed", "betrayal", "clownery", "chaos", "chaotic",
    "complain", "complaining", "condemn", "confront", "confrontation", "confrontational",
    "crass", "coward", "cowardly", "criticize", "criticized", "criticizing", "cruel", "cruelty",
    "debase", "debased", "deceit", "deceived", "deceive", "deception", "devious", "deviousness",
    "despicable", "disgrace", "disgraceful", "disgusting", "dishonest", "dishonorable",
    "disregard", "disreputable", "distasteful", "dodgy", "dull", "embarrass", "embarrassing",
    "embarrassment", "fabricator", "fail", "failed", "failure", "failures", "faithless", "farcical",
    "fiasco", "fibber", "fiddle", "fiddled", "fool", "foolish", "fraud", "fraudulence",
    "fraudulent", "furious", "gimmick", "good-for-nothing", "groan", "grotesque", "hackery",
    "half-truths", "hate", "hatred", "hodgepodge", "horrendous", "hostile", "hostility",
    "humiliate", "humiliating", "hypocrisy", "hypocrite", "idiot", "idiotic", "ignorance",
    "ignorant", "ill-judged", "ill-mannered", "immoral", "inadequacy", "incapable", "inferior",
    "insult", "insulted", "insulting", "intolerant", "ironic", "irony", "irritated", "jumble",
    "laughable", "lawbreakers", "leech", "libelous", "ludicrous", "mess", "misbehave", "mischief",
    "mischievous", "mislead", "misleading", "needless", "needlessly", "neglect", "neglected",
    "neglectful", "negligent", "nonsense", "nonsensical", "nasty", "obnoxious", "offend",
    "offenders", "outrageous", "outraged", "patronize", "patronizing", "petty", "penny-pinching",
    "phony", "petulant", "prejudice", "prejudices", "predictable", "problematic", "provoke",
    "provoked", "ridicule", "ridiculous", "reprehensible", "rude", "scandal", "scandalous",
    "scapegoat", "scapegoats", "scaremonger", "scaremongering", "shady", "shameful", "shambles",
    "sham", "shenanigans", "short-sighted", "silly", "silliness", "slander", "slanderous",
    "sleaze", "sleazy", "sly", "slyness", "smokescreen", "sneaky", "spite", "spiteful", "steal",
    "stereotyping", "stubborn", "stupid", "stupidity", "subterfuge", "swindling", "tactic",
    "talking back", "trick", "trickery", "unacceptable", "unhelpful", "unnatural", "untrue",
    "undermine", "outrageous", "vindictive", "villain", "woeful", "wrong"
]
import pandas as pd
import json

# Path to your file
file_path = '/Users/KaylaMuller/desktop/text_analysis/week12/cleantext_DonaldTrump.jsonl.txt'

# Read the file line by line and parse each line as JSON
data = []
with open(file_path, 'r', encoding='utf-8') as f:
    for line in f:
        data.append(json.loads(line))

# Turn into a DataFrame
Trumpdf = pd.DataFrame(data)
import re

# Make sure your list of words is defined
word_list = set(american_words)  

# Compile a regex pattern that matches any of the words, word-boundary safe
pattern = re.compile(r'\b(' + '|'.join(re.escape(word) for word in word_list) + r')\b', re.IGNORECASE)

# Apply a function to count matches in each row
Trumpdf["NegativeWordCount"] = Trumpdf["CleanText"].astype(str).apply(lambda text: len(pattern.findall(text)))
Trumpdf["TotalWordCount"] = Trumpdf["CleanText"].astype(str).apply(lambda text: len(re.findall(r'\b\w+\b', text)))
Trumpdf["neg_ratio"] = Trumpdf["NegativeWordCount"] / Trumpdf["TotalWordCount"] * 100

# Ensure the 'Date' column is in datetime format
Trumpdf["Date"] = pd.to_datetime(Trumpdf["Date"], errors="coerce")

# Drop rows where 'Date' is NaT (invalid dates)
Trumpdf = Trumpdf.dropna(subset=["Date"])

# Extract YearMonth in string format (YYYY-MM) for easier handling in ggplot
Trumpdf["YearMonth"] = Trumpdf["Date"].dt.to_period('M').astype(str)

# Calculate the average 'neg_ratio' by 'YearMonth'
monthly_avg_neg_ratio = Trumpdf.groupby("YearMonth")["neg_ratio"].mean().reset_index()

# Export the result to CSV for use in R
monthly_avg_neg_ratio.to_csv("monthly_avg_neg_ratio.csv", index=False)
library(reticulate)
library(ggplot2)

# Load the CSV file (make sure you have the correct path to the file)
df <- read.csv("monthly_avg_neg_ratio.csv")

# Convert 'YearMonth' to a date format
df$YearMonth <- as.Date(paste0(df$YearMonth, "-01"))

# Plot the data
ggplot(df, aes(x = YearMonth, y = neg_ratio)) +
  geom_line() +
  labs(title = "Monthly Average Aggression Ratio", x = "Month", y = "Aggression Ratio (%)") +
  theme_minimal()
# Sort by 'YearMonth' to ensure the rolling average works correctly
monthly_avg_neg_ratio = monthly_avg_neg_ratio.sort_values("YearMonth")

# Calculate the two-month rolling average of 'neg_ratio'
monthly_avg_neg_ratio["TwoMonthRollingAvg"] = monthly_avg_neg_ratio["neg_ratio"].rolling(window=2).mean()

# Export the result to CSV for use in R
monthly_avg_neg_ratio.to_csv("monthly_avg_neg_ratio_with_rolling_avg.csv", index=False)
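As a sanity check on the window logic, rolling(window=2).mean() averages each month with the one before it, leaving the first month undefined:

```python
import pandas as pd

# Four months of toy ratios
s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Two-month rolling average: each value averaged with its predecessor
roll = s.rolling(window=2).mean()
# roll -> [NaN, 1.5, 2.5, 3.5]; the first entry is NaN because the
# window needs two observations before it can produce a mean
```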
library(ggplot2)
library(readr)
library(dplyr)

# Read the data
monthly_avg_neg_ratio <- read_csv("monthly_avg_neg_ratio_with_rolling_avg.csv")

# Convert YearMonth to Date type
monthly_avg_neg_ratio <- monthly_avg_neg_ratio %>%
  mutate(Date = as.Date(paste0(YearMonth, "-01")))

# Plot with ggplot
ggplot(monthly_avg_neg_ratio, aes(x = Date)) +
  geom_line(aes(y = neg_ratio), color = "blue", linetype = "dashed", size = 1) +
  geom_line(aes(y = TwoMonthRollingAvg), color = "red", size = 1) +
  labs(title = "Monthly Negative Ratio with Two-Month Rolling Average",
       x = "Date",
       y = "Negative Ratio (%)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "1 month")

Analysis of Aggression in the 75th Percentile

# Subset the DataFrame to rows where 'neg_ratio' exceeds the 75th-percentile cutoff (0.206258)
subset_df = Trumpdf[Trumpdf["neg_ratio"] > 0.206258]

# Calculate the average 'neg_ratio' by 'YearMonth'
subset_monthly_avg_neg_ratio = subset_df.groupby("YearMonth")["neg_ratio"].mean().reset_index()

# Export the result to CSV for use in R (distinct filename so the full-dataset CSV is not overwritten)
subset_monthly_avg_neg_ratio.to_csv("subset_monthly_avg_neg_ratio.csv", index=False)
library(reticulate)
library(ggplot2)

# Load the CSV file (make sure you have the correct path to the file)
df_with_subset <- read.csv("subset_monthly_avg_neg_ratio.csv")

# Convert 'YearMonth' to a date format
df_with_subset$YearMonth <- as.Date(paste0(df_with_subset$YearMonth, "-01"))

# Plot the data
ggplot(df_with_subset, aes(x = YearMonth, y = neg_ratio)) +
  geom_line() +
  labs(title = "Monthly Average Aggression Ratio for the 75th percentile", x = "Month", y = "Aggression Ratio (%)") +
  theme_minimal()

Monthly Average Flesch Score

from textstat import flesch_reading_ease

Trumpdf['flesch_score'] = Trumpdf['CleanText'].apply(flesch_reading_ease)

# Calculate the average 'flesch_score' by 'YearMonth'
monthly_avg_flesch_score = Trumpdf.groupby("YearMonth")["flesch_score"].mean()

# Export the result to CSV for use in R
monthly_avg_flesch_score.to_csv("monthly_avg_flesch_score.csv", index=True)
library(ggplot2)
library(readr)
library(dplyr)

# Read the data
monthly_avg_flesch_score <- read_csv("monthly_avg_flesch_score.csv")

# Convert YearMonth to Date type
monthly_avg_flesch_score <- monthly_avg_flesch_score %>%
  mutate(Date = as.Date(paste0(YearMonth, "-01")))

# Plot with ggplot
ggplot(monthly_avg_flesch_score, aes(x = Date)) +
  geom_line(aes(y = flesch_score), color = "blue", size = 1) +
  labs(title = "Monthly Average Flesch Score",
       x = "Date",
       y = "Flesch Score") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "1 month")

Analysis of Flesch Score Above the 75th Percentile

# Subset the DataFrame to rows where 'flesch_score' exceeds the 75th-percentile cutoff (68.72)
subset_df_flesch_score = Trumpdf[Trumpdf["flesch_score"] > 68.72]

# Calculate the average 'flesch_score' by 'YearMonth'
subset_monthly_avg_flesch_score = subset_df_flesch_score.groupby("YearMonth")["flesch_score"].mean().reset_index()

# Export the result to CSV for use in R
subset_monthly_avg_flesch_score.to_csv("subset_monthly_avg_flesch_score.csv", index=False)
library(ggplot2)
library(readr)
library(dplyr)

# Read the data
subset_monthly_avg_flesch_score <- read_csv("/Users/KaylaMuller/Desktop/text_analysis/week12/subset_monthly_avg_flesch_score.csv")

# Convert YearMonth to Date type
subset_monthly_avg_flesch_score <- subset_monthly_avg_flesch_score %>%
  mutate(Date = as.Date(paste0(YearMonth, "-01")))

# Plot with ggplot
ggplot(subset_monthly_avg_flesch_score, aes(x = Date)) +
  geom_line(aes(y = flesch_score), color = "blue", size = 1) +
  labs(title = "Monthly Average Flesch Score for the 75th Percentile",
       x = "Date",
       y = "Flesch Score") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "1 month")

Topic Modeling: Aggression in the 75th Percentile

import string
import re
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
# First run may need: nltk.download('punkt'), nltk.download('stopwords'), nltk.download('wordnet')

# Step 0: Optional — Make a copy to avoid SettingWithCopyWarning
subset_df = subset_df.copy()

# Setup
stop = set(stopwords.words('english'))
stop.add('applause')  # custom stopword
lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

# Combined cleaning function
def clean_text(text):
    text = text.lower()  # lowercase
    text = text.translate(str.maketrans('', '', string.punctuation))  # remove punctuation
    text = re.sub(r'\d+', '', text)  # remove numbers
    tokens = word_tokenize(text)  # tokenize
    tokens = [word for word in tokens if word not in stop]  # remove stopwords
    tokens = [lemmatizer.lemmatize(word) for word in tokens]  # lemmatization
    tokens = [stemmer.stem(word) for word in tokens]  # stemming
    return ' '.join(tokens)

# Apply to DataFrame
subset_df['CleanText_transformed'] = subset_df['CleanText'].apply(clean_text)
# Vectorize
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(max_df=0.9, min_df=2, stop_words='english')  # stopwords already removed during cleaning; harmless here
dtm = vectorizer.fit_transform(subset_df['CleanText_transformed'])
from sklearn.decomposition import LatentDirichletAllocation

lda = LatentDirichletAllocation(n_components=7, random_state=42)  # fit 7 topics
lda.fit(dtm)
def display_topics(model, feature_names, num_top_words):
    for idx, topic in enumerate(model.components_):
        print(f"Topic #{idx + 1}: ", " ".join([feature_names[i] for i in topic.argsort()[:-num_top_words - 1:-1]]))
        
display_topics(lda, vectorizer.get_feature_names_out(), 10)
topic_results = lda.transform(dtm)
subset_df['DominantTopic'] = topic_results.argmax(axis=1)
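The transform/argmax step above assigns each speech the topic carrying the largest weight in its topic distribution; a toy illustration:

```python
import numpy as np

# Each row is one document's distribution over three topics
doc_topics = np.array([
    [0.10, 0.70, 0.20],   # dominated by topic index 1
    [0.50, 0.30, 0.20],   # dominated by topic index 0
])

# Index of the highest-weight topic per document
dominant = doc_topics.argmax(axis=1)
# dominant -> array([1, 0])
```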

WordClouds Representing Top Topics: Aggression

from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Get the feature names (words)
feature_names = vectorizer.get_feature_names_out()

# Loop over each topic
for topic_idx, topic_weights in enumerate(lda.components_):
    # Create dictionary: word -> weight
    word_freq = {feature_names[i]: topic_weights[i] for i in topic_weights.argsort()[:-31:-1]}  # top 30 words
    
    # Generate the word cloud
    wordcloud = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(word_freq)
    
    # Plot the word cloud
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.title(f"Topic #{topic_idx + 1}")
    plt.show()

Topic Modeling: Simplicity in the 75th Percentile

import string
import re
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer

# Step 0: Optional — Make a copy to avoid SettingWithCopyWarning
subset_df_flesch_score = subset_df_flesch_score.copy()

# Setup
stop = set(stopwords.words('english'))
stop.add('applause')  # custom stopword
lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

# Combined cleaning function
def clean_text(text):
    text = text.lower()  # lowercase
    text = text.translate(str.maketrans('', '', string.punctuation))  # remove punctuation
    text = re.sub(r'\d+', '', text)  # remove numbers
    tokens = word_tokenize(text)  # tokenize
    tokens = [word for word in tokens if word not in stop]  # remove stopwords
    tokens = [lemmatizer.lemmatize(word) for word in tokens]  # lemmatization
    tokens = [stemmer.stem(word) for word in tokens]  # stemming
    return ' '.join(tokens)

# Apply to DataFrame
subset_df_flesch_score['CleanText_transformed'] = subset_df_flesch_score['CleanText'].apply(clean_text)
# Vectorize
from sklearn.feature_extraction.text import CountVectorizer

vectorizer2 = CountVectorizer(max_df=0.9, min_df=2, stop_words='english')  # stopwords already removed during cleaning; harmless here
dtm2 = vectorizer2.fit_transform(subset_df_flesch_score['CleanText_transformed'])
from sklearn.decomposition import LatentDirichletAllocation

lda2 = LatentDirichletAllocation(n_components=6, random_state=42)  # fit 6 topics
lda2.fit(dtm2)
def display_topics(model, feature_names, num_top_words):
    for idx, topic in enumerate(model.components_):
        print(f"Topic #{idx + 1}: ", " ".join([feature_names[i] for i in topic.argsort()[:-num_top_words - 1:-1]]))
        
display_topics(lda2, vectorizer2.get_feature_names_out(), 10)
topic_results2 = lda2.transform(dtm2)
subset_df_flesch_score['DominantTopic'] = topic_results2.argmax(axis=1)

WordClouds Representing Top Topics: Simplicity

from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Get the feature names (words)
feature_names = vectorizer2.get_feature_names_out()

# Loop over each topic
for topic_idx, topic_weights in enumerate(lda2.components_):
    # Create dictionary: word -> weight
    word_freq = {feature_names[i]: topic_weights[i] for i in topic_weights.argsort()[:-31:-1]}  # top 30 words
    
    # Generate the word cloud
    wordcloud = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(word_freq)
    
    # Plot the word cloud
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.title(f"Topic #{topic_idx + 1}")
    plt.show()